LearningPinocchio: adaptive information extraction for real world applications

Authors

  • Fabio Ciravegna
  • Alberto Lavelli
Abstract

The new frontier of research on Information Extraction (IE) from texts is portability without any knowledge of Natural Language Processing. The market potential is very large in principle, provided that a suitable, easy-to-use and effective methodology is available. In this paper we describe LearningPinocchio, a system for adaptive Information Extraction from texts that is having good commercial and scientific success. Real-world applications have been built, and evaluation licenses have been released to external companies for application development. We outline the basic algorithm behind the system and present a number of applications developed with LearningPinocchio. We then report on an evaluation performed by an independent company. Finally, we discuss the general suitability of this IE technology for real-world applications and draw some conclusions.

Similar resources

Learning to Tag for Information Extraction from Text

LearningPINOCCHIO is an algorithm for adaptive information extraction. It learns template filling rules that insert SGML tags into texts. LearningPINOCCHIO is based on a covering algorithm that learns rules by bottom-up generalization of instances in a tagged corpus. It has been tested on three scenarios in informal domains in two languages (Italian and English). Experiments report excellent re...

Fingerprint Core and Delta Detection by Candidate Analysis

In many real-world applications such as face recognition and mobile robotics, we need to use an adaptive version of feature extraction techniques. In this paper, we introduce an adaptive face recognition system based on the PCA algorithm. We combine Sanger’s adaptive algorithm for computing the effective eigenvectors with the QR decomposition algorithm, which is used to estimate the associated eigenvalues. By...

A review of agent-based modeling (ABM) concepts and some of its main applications in management science

We live in a very complex world where we face complex phenomena such as social norms and new technologies. To deal with such phenomena, social scientists often use a reductionist approach, in which they reduce them to lower-level variables and model the relationships among them through a system of equations. This approach, called equation-based modeling (EBM), has some basic weaknesses in ...

Adaptive Information Extraction from Text by Rule Induction and Generalisation

(LP)² is a covering algorithm for adaptive Information Extraction from text (IE). It induces symbolic rules that insert SGML tags into texts by learning from examples found in a user-defined tagged corpus. Training is performed in two steps: initially a set of tagging rules is learned; then additional rules are induced to correct mistakes and imprecision in tagging. Induction is performed by b...

Target Tracking with Unknown Maneuvers Using Adaptive Parameter Estimation in Wireless Sensor Networks

Tracking a target that is sensed by a collection of randomly deployed, limited-capacity, short-range sensors is a tricky problem, yet one that applies to the real world. In this paper, this challenge is addressed by introducing a nested algorithm to track a maneuvering target entering the sensor field. In the proposed nested algorithm, different modules are to fulfill...

Journal:
  • Natural Language Engineering

Volume 10

Publication date: 2004